Abstract. The entropy rates of the Wright-Fisher process, the Moran process, and
generalizations are computed and used to compare these processes and their dependence on
standard evolutionary parameters. Entropy rates are measures of the variation dependent 
on both short-run and long-run behavior, and allow the relationships between mutation, 
selection, and population size to be examined. Bounds for the entropy rate are given for the 
Moran process (independent of population size) and for the Wright-Fisher process (bounded 
for fixed population size). A generational Moran process is also presented for comparison to 
the Wright-Fisher Process. Results include analytic results and computational extensions. 



1. Introduction

Populations of replicating entities are subject to a variety of selective, stochastic, and
diversifying processes. These processes can act on different time scales, ranging from
short-term stochastic drift in small populations to long-term selective processes that slowly
lead to the fixation of one trait in a population. Many such processes have been studied
extensively, including founder and bottleneck effects, selective stability, and
mutation-selection balances [1] [2], and depend significantly on certain population parameters,
such as population size [3] [4] [5] [6]. Accordingly, variation in population dynamics arises
from both short-term and long-term effects.

Why study the entropy rate of models of evolutionary processes? Entropy rates measure the
inherent randomness of a process due to both short-term and long-term dynamics. Moreover, by
assigning a value to each particular process we gain not only the ability to compare processes
but also a way to explore the interactions of the fundamental processes of population biology:
natural selection (through the fitness landscape), genetic drift (via the population size), and
diversifying processes such as mutation. We can also study the fundamental processes in
evolutionary dynamics pairwise by considering several limits: to eliminate drift, we can let
the population size $N \to \infty$; to eliminate mutation, we can let the mutation probability
$\mu \to 0$; and to eliminate selection, we can use a uniform fitness landscape. Entropy rates
reveal that there is significant long-run variation in finite population dynamics even in cases
that are thought of as evolutionarily stable, and also situations in which the "most inherently
random" behavior occurs for relatively large populations rather than from the stochastic
effects of small populations.

1.1. The Moran Process and Generalizations. The Moran process is a birth-death
process that describes natural selection in finite populations [7] and has many applications
[8] [9]. In each round of the process, an individual is chosen proportionally to fitness to
reproduce and an individual is chosen at random to be replaced. The classical Moran process
was generalized to include mutation and frequency-dependent fitness by Fudenberg et al. [10].
Let us consider a slight generalization to include possibly variable mutation rates that depend
on the population state. For a population of size $N$, let the population be divided into two
types $A$ and $B$, with the number of $A$ individuals denoted by $i$ and the number of $B$
individuals by $N - i$. A pair $(i, N - i)$ with $0 \le i \le N$ is a population state. Let
$f_A$ and $f_B$ be the fitness of the types $A$ and $B$ respectively, possibly depending on the
population state (i.e. fitness is frequency-dependent). The Moran process has transition
probabilities

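\[
T_{i \to i+1} = \frac{i f_A(i)\,(1 - \mu_{AB}(i)) + (N - i) f_B(i)\,\mu_{BA}(N - i)}
                     {i f_A(i) + (N - i) f_B(i)} \cdot \frac{N - i}{N},
\]
\[
T_{i \to i-1} = \frac{i f_A(i)\,\mu_{AB}(i) + (N - i) f_B(i)\,(1 - \mu_{BA}(N - i))}
                     {i f_A(i) + (N - i) f_B(i)} \cdot \frac{i}{N}, \tag{1}
\]
\[
T_{i \to i} = 1 - T_{i \to i+1} - T_{i \to i-1},
\]

where $\mu_{AB}$ and $\mu_{BA}$ are mutation probabilities that may depend on the state, and the
fitness landscape is given by

\[
f_A(i) = \frac{a(i - 1) + b(N - i)}{N - 1}, \qquad
f_B(i) = \frac{c\,i + d(N - i - 1)}{N - 1},
\]

for a game matrix defined by

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]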

In accordance with [11] and [10], further assume that $T_{0 \to 1} = \mu_{AB}$,
$T_{0 \to 0} = 1 - \mu_{AB}$, $T_{N \to N-1} = \mu_{BA}$, and $T_{N \to N} = 1 - \mu_{BA}$, so that
the Markov process has a stationary distribution and no absorbing states. We will consider two
mutation regimes: the boundary regime defined by $\mu_{AB}(i) = 0 = \mu_{BA}(i)$ for
$i \neq 0, N$, and the uniform regime defined by $\mu_{AB}(i) = \mu_{AB}$ and
$\mu_{BA}(i) = \mu_{BA}$ for all $i$ (so that the mutation rates are constant). The uniform regime
is a more realistic model of mutation, whereas the boundary regime is the minimal amount of
mutation required to ensure a stationary distribution for the Moran process in most of the
cases we will consider. If $i f_A(i) = (N - i) f_B(i)$ for all $i \neq 0, N$, then the
two regimes are equivalent.
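
As an illustration, the transition probabilities of Equation (1) can be assembled into a matrix
numerically. The following is a minimal sketch, not the authors' implementation: the function
name `moran_transition_matrix`, the `regime` keyword, and the use of NumPy are assumptions made
here for illustration, and the fitness values are assumed to be positive so the denominator
never vanishes.

```python
import numpy as np

def moran_transition_matrix(N, a, b, c, d, mu_ab, mu_ba, regime="uniform"):
    """Return the (N+1)x(N+1) matrix with T[i, j] = T_{i -> j} from Equation (1).

    Hypothetical helper: `regime` is "uniform" (constant mutation rates in every
    state) or "boundary" (mutation only in the states i = 0 and i = N).
    """
    if regime not in ("uniform", "boundary"):
        raise ValueError("regime must be 'uniform' or 'boundary'")
    T = np.zeros((N + 1, N + 1))
    # Interior mutation is switched off in the boundary regime.
    m_ab = mu_ab if regime == "uniform" else 0.0
    m_ba = mu_ba if regime == "uniform" else 0.0
    for i in range(1, N):
        f_a = (a * (i - 1) + b * (N - i)) / (N - 1)
        f_b = (c * i + d * (N - i - 1)) / (N - 1)
        total = i * f_a + (N - i) * f_b
        up = (i * f_a * (1 - m_ab) + (N - i) * f_b * m_ba) / total * (N - i) / N
        down = (i * f_a * m_ab + (N - i) * f_b * (1 - m_ba)) / total * i / N
        T[i, i + 1], T[i, i - 1], T[i, i] = up, down, 1 - up - down
    # Boundary rows as assumed in the text, so that neither state is absorbing.
    T[0, 1], T[0, 0] = mu_ab, 1 - mu_ab
    T[N, N - 1], T[N, N] = mu_ba, 1 - mu_ba
    return T

# Example call with the parameter values quoted in the caption of Figure 1.
T_example = moran_transition_matrix(100, 2, 4, 3, 2, 0.001, 0.01, regime="uniform")
print(np.allclose(T_example.sum(axis=1), 1.0))
```

Each row of the returned matrix is the transition distribution out of one population state, so
the rows sum to one and only the three central diagonals are nonzero.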

1.2. The Wright-Fisher Process. In contrast to the Moran process, which models a population
in terms of individual birth-death events, the Wright-Fisher process is a generational
model of evolution [12] [13]. Each successive generation is formed by sampling the current
generation proportionally to fitness. Define the Wright-Fisher process with mutation for
evolutionary games by the following transition probabilities:

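\[
T_{i \to j} = \binom{N}{j}
\left( \frac{i f_A(i)\,(1 - \mu_{AB}(i)) + (N - i) f_B(i)\,\mu_{BA}(N - i)}
            {i f_A(i) + (N - i) f_B(i)} \right)^{j}
\left( \frac{i f_A(i)\,\mu_{AB}(i) + (N - i) f_B(i)\,(1 - \mu_{BA}(N - i))}
            {i f_A(i) + (N - i) f_B(i)} \right)^{N - j}.
\]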

This is a slight generalization of the basic process as given by Imhof and Nowak [12] to include 
mutation, though we will not consider parameters for intensity of selection. In contrast to the
Moran process, the Wright-Fisher process is not tridiagonal; rather, every state is accessible
from every other state, so long as the fitness landscape is non-zero. For more on both the
Moran process and the Wright-Fisher process see [14].
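
The binomial structure of the Wright-Fisher transition probabilities can be sketched in code as
follows. This is an illustrative sketch under stated assumptions, not the authors' code: the
function name is hypothetical, mutation rates are taken to be constant (uniform regime) in the
interior, fitness values are assumed positive, and the two boundary rows are set by the same
convention the text uses for the Moran process.

```python
import numpy as np
from math import comb

def wright_fisher_transition_matrix(N, a, b, c, d, mu_ab, mu_ba):
    """Return T with T[i, j] = C(N, j) p_i^j (1 - p_i)^(N - j) for 0 < i < N.

    Hypothetical helper; intended for moderate N, since comb(N, j) is converted
    to a float when multiplied by the probabilities.
    """
    T = np.zeros((N + 1, N + 1))
    for i in range(1, N):
        f_a = (a * (i - 1) + b * (N - i)) / (N - 1)
        f_b = (c * i + d * (N - i - 1)) / (N - 1)
        total = i * f_a + (N - i) * f_b
        # Probability that a sampled offspring is of type A after mutation.
        p = (i * f_a * (1 - mu_ab) + (N - i) * f_b * mu_ba) / total
        for j in range(N + 1):
            T[i, j] = comb(N, j) * p**j * (1 - p) ** (N - j)
    # Assumed boundary convention: only a single mutant can appear.
    T[0, 0], T[0, 1] = 1 - mu_ab, mu_ab
    T[N, N], T[N, N - 1] = 1 - mu_ba, mu_ba
    return T

T_wf = wright_fisher_transition_matrix(10, 1, 1, 1, 1, 0.5, 0.5)
print(np.allclose(T_wf.sum(axis=1), 1.0))
```

Unlike the Moran matrix, every interior row here is dense, which is why the number of reachable
states, and hence the possible transition entropy, grows with $N$.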

1.3. Entropy Rate. A fundamental tool in information theory, probability, and statistics is
the Shannon entropy of a probability distribution [15]. For a discrete probability distribution
$p = (p_0, p_1, \ldots, p_n)$, the Shannon entropy (or simply entropy) is

\[
H(p) = -\sum_{i=0}^{n} p_i \log p_i,
\]

where $p_i \log p_i = 0$ if $p_i = 0$. The entropy of a probability distribution
is often described as a measure of uncertainty or information content. The entropy rate
of a stationary Markov process is an information-theoretic quantity that characterizes the
inherent randomness of the process [16] [17], and plays a role similar to that of the Shannon
entropy. To each state $i$ of a Markov process $P$ there is a probability distribution
$T_i = (T_{i \to 0}, \ldots, T_{i \to n})$ for the transition probabilities out of the state. We
refer to the entropies of these transition probability distributions as the transition
entropies $H(T_i)$. The mean of the transition entropies taken with respect to the stationary
distribution $s = (s_0, \ldots, s_n)$ of the Markov process is the entropy rate:

\[
H(P) = -\sum_{i=0}^{n} s_i \sum_{j=0}^{n} T_{i \to j} \log T_{i \to j}
     = \sum_{i=0}^{n} s_i H(T_i).
\]

The stationary distribution is a description of the long-term behavior of a Markov process,
and so the entropy rate can be similarly interpreted as a measure of the uncertainty, inherent
randomness, or information content of the long-run behavior of the process. The entropy
rate is affected by the likelihood that the process occupies a particular state and the entropy
of the behavior at that state. In other words, the entropy rate reflects both the long-term
variance in population states (the stationary distribution) and the short-term variance due
to the entropy of the transition probabilities at the states represented significantly in the
stationary distribution.

Generally for a Markov process the transition probabilities are known a priori; the sta- 
tionary distribution, however, can be difficult to describe analytically, depending on the 
complexity of the transition probabilities. Since the maximal Shannon entropy for a discrete
distribution on $n$ states is $\log n$, the theoretical maximum entropy rate for a Markov process
on $n$ states is also $\log n$. For a tridiagonal process (e.g. the Moran process), each
transition distribution has at most three nonzero entries, so the maximum entropy rate is
$\log 3$. For the Wright-Fisher process, the theoretical maximum entropy rate is $\log (N + 1)$,
where $N$ is the population size, because there are $N + 1$ states (and so typically $N + 1$
non-zero values in each transition distribution).
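
The definition of the entropy rate translates directly into a short numerical routine. The
sketch below is one possible approach rather than the method used for the paper's computations:
it assumes natural logarithms, a row-stochastic NumPy array `T` with a unique stationary
distribution, and hypothetical function names.

```python
import numpy as np

def stationary_distribution(T):
    """Solve s T = s together with sum(s) = 1 as a least-squares linear system."""
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    s, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return s

def entropy_rate(T):
    """Entropy rate H(P) = -sum_i s_i sum_j T_ij log T_ij, with 0 log 0 = 0."""
    T = np.asarray(T, dtype=float)
    s = stationary_distribution(T)
    # Only evaluate the logarithm where T > 0; other entries contribute zero.
    logs = np.log(T, where=T > 0, out=np.zeros_like(T))
    return float(-np.sum(s[:, None] * T * logs))

# A three-state tridiagonal chain whose central row is (1/4, 1/2, 1/4).
T_small = np.array([[0.5, 0.5, 0.0],
                    [0.25, 0.5, 0.25],
                    [0.0, 0.5, 0.5]])
print(entropy_rate(T_small), np.log(3))
```

For any tridiagonal chain the printed entropy rate cannot exceed $\log 3$, in line with the bound
stated above.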

Figure 1. Top: Transition probabilities for $N = 100$, $\mu_{AB} = 0.001$, $\mu_{BA} = 0.01$,
uniform mutations, and game matrix given by $a = 2$, $b = 4$, $c = 3$, and $d = 2$. Red, green,
and blue correspond to $T_{i \to i-1}$, $T_{i \to i}$, $T_{i \to i+1}$ for each state $i$.
Middle: Transition entropies. Bottom: Stationary distribution. The entropy rate in this example
is approximately $H = 0.9472$. Stationary distributions are not generally Gaussian [11].

Stationary distributions for the Moran process and some recently studied generalizations
are given by Claussen and Traulsen in [11] (see also [18]); we briefly discuss their
computation. The components $s_i$ of the stationary distribution satisfy
$s_i T_{i \to i+1} = s_{i+1} T_{i+1 \to i}$ and

\[
s_j = s_0 \prod_{k=0}^{j-1} \frac{T_{k \to k+1}}{T_{k+1 \to k}}, \tag{2}
\]

where $s_0$ can be obtained from the normalization $\sum_{i} s_i = 1$:

\[
s_0 = \left( 1 + \sum_{j=1}^{N} \prod_{k=0}^{j-1} \frac{T_{k \to k+1}}{T_{k+1 \to k}} \right)^{-1}. \tag{3}
\]
This particular formulation relies on the fact that the processes are tridiagonal with only 
transitions between neighboring states being nonzero. From a computational perspective, 
for any concrete values of the various parameters of these processes, the stationary distribu- 
tion can be computed efficiently even for relatively large populations using a sparse matrix 
approach, and useful analytic forms can be given in some cases. Finally, note that nonzero
mutation probabilities on the boundary states $i = 0, N$ are required so that the Markov
process has a unique stationary distribution. In other words, we must prevent these states from
being absorbing, and we can recover the behavior of processes without mutation by letting
$\mu$ tend to zero. Analytic solutions for some examples of the Moran process on evolutionary
games in the boundary regime are given in [11]. See Figure 1 for an example of a Moran
process with associated transition entropies and the stationary distribution.
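
Because only neighboring transitions are nonzero, the detailed-balance relation
$s_i T_{i \to i+1} = s_{i+1} T_{i+1 \to i}$ determines the stationary distribution up to
normalization. A minimal sketch of that computation follows; the function name is hypothetical
and a dense NumPy array is assumed for simplicity. For very large $N$ the running product may
underflow or overflow, in which case summing logarithms of the ratios would be a safer variant.

```python
import numpy as np

def tridiagonal_stationary(T):
    """Stationary distribution of a birth-death chain via detailed balance.

    The unnormalized weight of state j is the product over k < j of
    T[k, k+1] / T[k+1, k]; dividing by the total enforces sum(s) = 1.
    """
    N = T.shape[0] - 1
    ratios = np.array([T[k, k + 1] / T[k + 1, k] for k in range(N)])
    weights = np.concatenate([[1.0], np.cumprod(ratios)])
    return weights / weights.sum()

# Neutral Moran process with N = 2 and boundary mutation probability mu.
mu = 0.1
T_small = np.array([[1 - mu, mu, 0.0],
                    [0.25, 0.5, 0.25],
                    [0.0, mu, 1 - mu]])
s = tridiagonal_stationary(T_small)
print(s, np.allclose(s @ T_small, s))
```

The cost is linear in the number of states, which is why the tridiagonal case remains tractable
for large populations.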

For the Moran process we will consider several fundamental examples and give a variety
of analytical results. Calculation of the stationary distribution for the Wright-Fisher process
is not as easy, computationally or analytically, though see [12] for some results. We will give
computational results for comparison in some cases for the Wright-Fisher process, along with
some analytical results and conjectures.

Finally, we will make use of one additional information-theoretic quantity called the
Kullback-Leibler divergence [19] (or KL-divergence), which is a measure of "distance"
between probability distributions:

\[
D_{KL}(p \,\|\, q) = \sum_{i} \left( p_i \log p_i - p_i \log q_i \right). \tag{4}
\]

This divergence is not a true distance function in the sense of a metric (it does not satisfy
the triangle inequality); nevertheless it is a widely used measure of difference between
probability distributions. See [16] and [15] for more on any of the mentioned information
theory topics.
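
Equation (4) can be evaluated directly. The following sketch assumes natural logarithms, the
convention $0 \log 0 = 0$, and that $q_i > 0$ wherever $p_i > 0$; the function name is
hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log p_i - p_i log q_i, skipping terms with p_i = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

p = [0.5, 0.5]
q = [0.25, 0.75]
# The two orderings generally differ, one reason the divergence is not a metric.
print(kl_divergence(p, q), kl_divergence(q, p))
```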

1.4. n-fold Moran process. Since the Wright-Fisher process is a generational process,
replacing the entire population in each iteration, and the Moran process is an atomic process,
they exhibit very different behaviors. Consider the following process, which will be referred
to as the n-fold Moran process. Define each step of the process as $n$ steps of the Moran
process, so that $n = 1$ is the Moran process, and $n = N$, where $N$ is the population size,
yields a generational process derived from the Moran process.

The transition probabilities of the n-fold process can be computed directly from the
transition matrix of the Moran process (Equation (1)) by simply computing the $n$-th power of
the transition matrix. Since the transition matrix of the Moran process is tridiagonal, each
iterate will have two more nonzero diagonals corresponding to the two new population states
accessible in each step of the compressed process. Moreover, since the stationary distribution
of a Markov chain can be obtained from the rows of the matrix defined by

\[
\lim_{k \to \infty} T^{k} = \lim_{k \to \infty} (T^{n})^{k},
\]

the stationary distributions of the n-fold Moran process are the same for all $n$, given a fixed
transition matrix $T$. The entries $T^{n}_{a \to a'}$ of the transition matrix correspond to the
probability of moving from population state $(a, N - a)$ to population state $(a', N - a')$ in
exactly $n$ steps of the Moran process.
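
The construction can be checked numerically on a small example. In the sketch below, the helper
`neutral_boundary_moran` is a hypothetical convenience that builds the neutral Moran chain with
mutation only at the boundary states, where Equation (1) reduces to
$T_{i \to i \pm 1} = i(N - i)/N^2$ in the interior; the n-fold matrix is then a plain matrix
power.

```python
import numpy as np

def neutral_boundary_moran(N, mu):
    """Hypothetical helper: neutral Moran chain with mutation only at i = 0, N."""
    T = np.zeros((N + 1, N + 1))
    for i in range(1, N):
        step = i * (N - i) / N**2  # T_{i->i+1} = T_{i->i-1} when f_A = f_B
        T[i, i - 1], T[i, i], T[i, i + 1] = step, 1 - 2 * step, step
    T[0, 0], T[0, 1] = 1 - mu, mu
    T[N, N], T[N, N - 1] = 1 - mu, mu
    return T

def n_fold(T, n):
    """Transition matrix of the n-fold process: n Moran steps per iteration."""
    return np.linalg.matrix_power(T, n)

N, mu = 6, 0.05
T = neutral_boundary_moran(N, mu)
T3 = n_fold(T, 3)

# Stationary distribution of the one-step chain from detailed balance.
ratios = [T[k, k + 1] / T[k + 1, k] for k in range(N)]
s = np.concatenate([[1.0], np.cumprod(ratios)])
s /= s.sum()

print(np.allclose(s @ T3, s))   # s is also stationary for the 3-fold process
print(np.count_nonzero(T3[0]))  # number of states reachable from i = 0 in 3 steps
```

The stationary distribution is unchanged by taking powers, so any difference in entropy rate
between the Moran process and its n-fold versions comes entirely from the transition entropies.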

2. Results 

2.1. Neutral Evolution: Moran Process. First consider a population where both types
have equal and constant fitness, i.e. for the game matrix of all ones, or more generally, when
$f_A(i) = f_B(i)$ for all $i$. Figure 2 shows the entropy rate as a function of the population
size $N$ for various $\mu$ for the Moran process. In the case $N = 2$, the stationary
distribution is $(1, 4\mu, 1)/(2 + 4\mu)$ and the central state has transition distribution
$(1/4, 1/2, 1/4)$, so it is easy to show that

\[
H(P) = \frac{H((\mu, 1 - \mu)) + 3\mu \log 2}{1 + 2\mu},
\]

where $H((\mu, 1 - \mu))$ is the binary entropy function (the Shannon entropy for the
distribution $(\mu, 1 - \mu)$). In this special case, the two mutation regimes yield the same
process, which is typically not true for $N > 2$. Though this is a very simple case, it
illustrates some common features of these processes. For instance, as $\mu \to 0$, the entropy
rate $H(P) \to 0$, a fact which holds for a wide variety of such processes. (We will also see
that the constant $\frac{3}{2} \log 2$ plays a special role.)
Figure 2. Entropy Rate vs. Population Size for $\mu_{AB} = \mu_{BA} \in
\{0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001\}$ (top to bottom) with a neutral fitness
landscape. Left: Mutations only at the boundary states (boundary regime);
the entropy rate eventually approaches zero as $N \to \infty$. Right: Mutations for
all states (uniform regime); the entropy rate approaches $\frac{3}{2} \log 2$ as $N \to \infty$.



Fudenberg et al. show in [10] that if $k = \mu_{BA}/\mu_{AB}$ is fixed along with the population
size $N$, then for the uniform mutation regime the stationary distribution of the process
converges, as the mutation probabilities tend to zero, to

\[
\left( \frac{k\rho_A}{k\rho_A + \rho_B},\ 0,\ \ldots,\ 0,\ \frac{\rho_B}{k\rho_A + \rho_B} \right),
\]

where $\rho_A$ and $\rho_B$ are the fixation probabilities of the types $A$ and $B$ respectively,
each for a population that starts with a single individual of that type. The fixation
probability depends on the fitness landscape, which is not necessarily neutral. This is an
essential ingredient for the following result (all proofs in the appendix).

Theorem 1. Let $\mu_{AB} = \mu$ and $\mu_{BA} = k\mu$. For the Moran process (Equations (1)) with
the uniform mutation regime and otherwise arbitrary parameters, $\lim_{\mu \to 0} H(P) = 0$.

Though simple to prove given the result of [10], this theorem embodies an important fact
about mutation in evolutionary processes. In this case the entropy rate reflects the fact that
in the absence of mutation, the long-run behavior of the population is fixation on one of two
types, and the inherent randomness of the population dynamics is eliminated. The same
limit holds for the boundary regime as well.

Theorem 2. For the boundary mutation regime and assumptions otherwise the same as in
Theorem 1, $\lim_{\mu \to 0} H(P) = 0$.

2.2. Large Populations. It may be tempting to think that the entropy rate also tends to zero
for large population sizes, since an infinitely large population should not be subject to
evolutionary drift, or should otherwise have reduced variance in the viable long-term
states. Figure 2 suggests that this is not so in the uniform mutation regime. Indeed, the
entropy rate need not vanish in the large population limit, as we will see for the neutral
landscape, but for the same landscape with the boundary regime, the entropy rate does
vanish. Consider the neutral landscape with the boundary mutation regime for arbitrary $N$
and $\mu$. A straightforward calculation shows that



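\[
s_0 = s_N = \left( 2 + \mu N^2 \sum_{i=1}^{N-1} \frac{1}{i(N - i)} \right)^{-1}, \tag{5}
\]
\[
s_j = \frac{\mu N^2}{\left( 2 + \mu N^2 \sum_{i=1}^{N-1} \frac{1}{i(N - i)} \right) j(N - j)},
\qquad 0 < j < N. \tag{6}
\]

As expected from Theorem 2, it is still the case that as $\mu \to 0$, $s_0 = s_N \to 1/2$ and
$s_j \to 0$ for $0 < j < N$ in the boundary mutation regime. The summation satisfies
$\sum_{i=1}^{N-1} \frac{1}{i(N - i)} = 2h_{N-1}/N \approx 2 \log N / N$, where $h_n$ is the $n$th
harmonic number. For large $N$ and fixed $\mu$, $s_i \to 0$ for all $i$. However, the $s_i$ do
not converge to zero at the same rate asymptotically: for $i \neq 0, N$, $s_i \sim 1/\log N$
near the boundaries, whereas $s_i \sim 1/(N \log N)$ when $i \approx N/2$ and when $i = 0, N$.
Nevertheless, it is the case that $H(P) \to 0$ as $N \to \infty$, which can be shown with a
tedious but direct calculation using Stirling's approximation and the fact that
$h_n \approx \log n$.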
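
The closed form for the neutral landscape in the boundary regime is easy to evaluate and to
sanity-check. The sketch below follows from detailed balance with
$T_{0 \to 1} = T_{N \to N-1} = \mu$ and $T_{i \to i \pm 1} = i(N - i)/N^2$ in the interior; the
function name is hypothetical.

```python
import numpy as np

def neutral_boundary_stationary(N, mu):
    """Stationary distribution of the neutral Moran process, boundary regime."""
    j = np.arange(1, N)
    interior_sum = np.sum(1.0 / (j * (N - j)))
    s0 = 1.0 / (2.0 + mu * N**2 * interior_sum)
    s = np.empty(N + 1)
    s[0] = s[N] = s0
    s[1:N] = mu * N**2 * s0 / (j * (N - j))
    return s

N, mu = 50, 0.01
s = neutral_boundary_stationary(N, mu)
harmonic = sum(1.0 / k for k in range(1, N))
j = np.arange(1, N)
print(abs(s.sum() - 1.0) < 1e-12)                                   # normalization
print(np.isclose(np.sum(1.0 / (j * (N - j))), 2 * harmonic / N))    # harmonic identity
```

The second check confirms the identity behind the $2 \log N / N$ estimate, which comes from the
partial-fraction split $1/(i(N - i)) = (1/N)(1/i + 1/(N - i))$.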

For the uniform mutation regime, the entropy rate does not approach zero as $N \to \infty$. In
this case, the interior probabilities are larger, with $i \approx N/2$ being the maximum. There
is a significant contribution to the entropy rate from this state (and nearby states) because
the transition probabilities at this state are approximately $T_i = (1/4, 1/2, 1/4)$ for large
$N$, so there is a contribution of $H(T_i) = \frac{3}{2} \log 2$ to the entropy rate (weighted by
the stationary distribution). For fixed $\mu$ and large $N$, the entropy rate converges to this
value. As an illustration, consider the following example for the uniform mutation case with
$\mu = 1/2$. Then the stationary distribution is

\[
s_i = \frac{1}{2^N} \binom{N}{i},
\]

which is maximal at $i = N/2$, rather than at $i = 0, N$ for the same process with only
boundary mutations. Hence these processes can have very different stationary distributions
for fixed $\mu$ and $N$ despite their similarity in definition. The entropy rate is

\[
H(P) = \sum_{i=0}^{N} \frac{1}{2^N} \binom{N}{i}\,
       H\!\left( \left( \frac{i}{2N},\ \frac{1}{2},\ \frac{N - i}{2N} \right) \right),
\]

which approaches $\frac{3}{2} \log 2$ as $N \to \infty$.
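
The $\mu = 1/2$ example can be evaluated numerically. With a neutral landscape and uniform
mutation probability 1/2, Equation (1) gives the transition distribution
$(i/2N,\ 1/2,\ (N - i)/2N)$ at state $i$, and the binomial distribution $2^{-N}\binom{N}{i}$
satisfies the detailed-balance relation for these transitions. The sketch below, with
hypothetical function names and natural logarithms, sums the weighted transition entropies.

```python
from math import comb, log

def transition_entropy(dist):
    """Shannon entropy of a finite distribution, with 0 log 0 = 0."""
    return -sum(p * log(p) for p in dist if p > 0)

def half_mutation_entropy_rate(N):
    """Entropy rate of the neutral Moran process with uniform mutation mu = 1/2."""
    total = 0.0
    for i in range(N + 1):
        s_i = comb(N, i) / 2**N                       # binomial stationary weight
        row = (i / (2 * N), 0.5, (N - i) / (2 * N))   # (down, stay, up)
        total += s_i * transition_entropy(row)
    return total

limit = 1.5 * log(2)
for N in (2, 20, 200):
    print(N, half_mutation_entropy_rate(N), limit)
```

Every transition entropy is at most $\frac{3}{2} \log 2$, so the computed values stay below the
limit and move toward it as the binomial weights concentrate near $i = N/2$.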

For this example, as a function of $N$ the entropy rate is strictly increasing, since the
stationary distribution is concentrating on the center where the entropy of the transition
probabilities is largest. The limit holds for all fixed $\mu$ for the neutral landscape.
Equations (3) and (2) imply that $s_j = s_{N-j}$ for all $j$ and that
$s_0 < s_1 < \cdots < s_{\lfloor N/2 \rfloor} = s_{\lceil N/2 \rceil} > \cdots > s_{N-1} > s_N$.
Then note that for $i < N/2$, $s_{i+1}/s_i$ has higher order dependence on $N$ as
$i$ approaches the central state(s). Similarly for $i > N/2$, so the stationary distribution is
increasingly concentrated on the central state(s) as $N \to \infty$, giving an increasing
entropy rate.

2.3. Asymmetric mutation probabilities. So far we have only considered explicit examples
where $\mu_{AB} = \mu = \mu_{BA}$. While all proper choices of $\mu_{AB}$ and $\mu_{BA}$ give a
maximum entropy rate as $N$ varies, in the case of a neutral landscape, the maximum is largest
when the mutation rates are equal. This is simply because for unequal mutation rates, the
stationary distribution is no longer concentrated on the states with the largest transition
entropies. More precisely, in Equations (2) and (3) the factors corresponding to
$T_{0 \to 1} = \mu_{AB}$ and $1/T_{N \to N-1} = 1/\mu_{BA}$ no longer cancel, which has the effect
of shifting the stationary distribution toward one of the boundary states depending on the
value of $k = \mu_{BA}/\mu_{AB}$. It is possible to solve for the stationary distribution in the
boundary regime for the neutral landscape as above (compare to Equation (6)):

\[
s_0 = \left( 1 + \frac{1}{k} + \mu_{AB} N^2 \sum_{j=1}^{N-1} \frac{1}{j(N - j)} \right)^{-1},
\qquad
s_j = \frac{\mu_{AB} N^2\, s_0}{j(N - j)} \quad (0 < j < N),
\qquad
s_N = \frac{s_0}{k}.
\]



Although the analytic calculation is messier for the uniform mutation regime, numerical
computations indicate that unequal mutation probabilities give the same tendency to shift
the stationary distribution away from the central states (where the transition entropies are
larger).

The two mutation regimes both have their merits: the boundary regime [11] is the simplest
approach that guarantees the existence of a stationary distribution for most game matrices
and is typically easier to generate analytic results for. The uniform mutation regime [10] is
perhaps a more realistic model of mutation, but has more complex transition probabilities.
In any case, the resulting processes are of different character in certain parameter ranges,
particularly in their large population behavior. Figure 3 shows the KL-divergence between
the stationary distributions of the uniform and boundary regimes for the neutral landscape for
a variety of parameters.

2.4. Maximum Entropy Rate for the Moran process. An obvious question resulting
from this section is simply: what is the maximum entropy rate any Moran process can
achieve? The maximum theoretical value is $\log 3$, and the entropy rate is limited by the
maximum transition entropy. Suppose that for some $i$, $T_{i \to i+1} = T_{i \to i-1}$. Then it is
a simple algebraic exercise to show, for boundary mutations, that $f_A(i) = f_B(i)$ and that the
transition distribution is

\[
T_i = \left( \frac{i(N - i)}{N^2},\ \frac{i^2 + (N - i)^2}{N^2},\ \frac{i(N - i)}{N^2} \right),
\]

whose entropy is maximal and equal to $\frac{3}{2} \log 2$ when $i = N/2$. (Also it is true that
if $f_A(i) = f_B(i)$, the same distribution holds.) So it is not possible for the transition
distribution to be $(1/3, 1/3, 1/3)$, which would produce the maximum value of $\log 3$.
Similarly, equating other transitions leads to entropies less than $\frac{3}{2} \log 2$. While it
may seem possible to contrive the transitions to whatever distribution one desires, once one
specifies the four variables needed for any one of the transitions, the other two are
determined. In fact, $\frac{3}{2} \log 2$ is the maximum possible value for any mutation
regime, population size, and fitness landscape, so long as a unique stationary distribution
exists (proof in the Appendix). This means that the neutral landscape achieves the largest
possible entropy rate, and so verifies the intuition that it should have the largest inherent
uncertainty in long-run population behavior.
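
A short numerical check of the transition entropies on the neutral landscape makes the bound
concrete. The sketch assumes natural logarithms and the interior transition distribution
$(x, 1 - 2x, x)$ with $x = i(N - i)/N^2$; the function name is hypothetical.

```python
from math import log

def neutral_transition_entropy(i, N):
    """Entropy of the neutral Moran transition distribution at interior state i."""
    x = i * (N - i) / N**2
    dist = (x, 1 - 2 * x, x)
    return -sum(p * log(p) for p in dist if p > 0)

N = 100
entropies = [neutral_transition_entropy(i, N) for i in range(1, N)]
# entropies[k] belongs to state i = k + 1, hence the offset.
best_i = 1 + max(range(N - 1), key=lambda k: entropies[k])
print(best_i, max(entropies), 1.5 * log(2), log(3))
```

Since $x \le 1/4$ and the entropy of $(x, 1 - 2x, x)$ is increasing on $[0, 1/3]$, the maximum
sits at the central state and stays strictly below $\log 3$.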
Figure 3. KL-divergence (Equation (4)) for the stationary distributions of
Moran processes for the neutral fitness landscape computed with the boundary
regime and the uniform regime. For small values of $\mu$ (horizontal axis)
relative to $N$ (vertical axis), i.e. $N\mu \ll 1$, the distance between the stationary
distributions is small, in accordance with Theorems 1 and 2. For larger
population sizes and fixed $\mu$, the stationary distributions diverge substantially.
The hyperbolic curves of constancy are given by $C = N\mu$ for various constants $C$.

3. Neutral Evolution: Wright-Fisher Process 

Stationary distributions for the Wright-Fisher process are difficult to compute analytically
in general. We can develop bounds for some important special cases. First, notice that for any
state $i \neq 0, N$, the transition distribution $T_i = (T_{i \to 0}, \ldots, T_{i \to N})$ is a
binomial distribution. Although a convenient closed form for the entropy of a binomial
distribution does not exist, there is a useful approximation given by the normal approximation
to the binomial (the de Moivre-Laplace central limit theorem). Given a binomial distribution
with probability $p$ and $N$ trials, the entropy is approximately

\[
H(\mathrm{binomial}(N, p)) = \frac{1}{2} \log \left( 2\pi e N p (1 - p) \right)
                             + O\!\left( \frac{1}{N} \right).
\]

For fixed $N$, the maximum value occurs when $p = 1/2$. For states $i = 0$ and $i = N$ the
distributions are just $(1 - \mu, \mu, 0, \ldots, 0)$ and $(0, \ldots, 0, \mu, 1 - \mu)$.
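
The quality of the normal approximation can be checked by comparing it with the exact binomial
entropy computed by direct summation. The sketch below assumes natural logarithms and uses
hypothetical function names.

```python
from math import comb, e, log, pi

def binomial_entropy(N, p):
    """Exact Shannon entropy of binomial(N, p) by summing over all outcomes."""
    probs = [comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(N + 1)]
    return -sum(q * log(q) for q in probs if q > 0)

def normal_approximation(N, p):
    """Leading term (1/2) log(2 pi e N p (1 - p)) of the binomial entropy."""
    return 0.5 * log(2 * pi * e * N * p * (1 - p))

for N in (15, 100):
    print(N, binomial_entropy(N, 0.5), normal_approximation(N, 0.5))
```

For $p = 1/2$ the leading term simplifies to $\frac{1}{2} \log(\pi e / 2) + \frac{1}{2} \log N$,
which grows without bound in $N$ but only half as fast as the theoretical maximum $\log (N + 1)$.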

To compare to the Moran process, let us consider some scenarios similar to those of the
previous section. Consider a very special case where the process is governed by uniform
mutations with $\mu = \mu_{AB} = \mu_{BA} = 1/2$ (for any fitness landscape). Then the transition
probabilities are given by $T_{i \to j} = \binom{N}{j} 2^{-N}$, and so we have a binomial
distribution with $p = 1/2$ for all $i \neq 0, N$. These distributions are largest at the
central state(s) and decrease monotonically away from the center. For $N\mu \gg 1$ the
stationary distribution is concentrated on the central states, and the entropy rate is

\[
H(P) \approx \frac{1}{2} \log \left( \frac{\pi e}{2} \right) + \frac{1}{2} \log N, \tag{8}
\]

which is less than the theoretical maximum of $\log N$, but still unbounded in $N$.
Computational results verify that this formula is an excellent approximation of the entropy
rate even for small $N \approx 15$. This is the maximum attainable entropy rate for the
Wright-Fisher process since it is the maximum possible entropy for a binomial distribution. For
the boundary mutation regime, the entropy rate appears to be increasing for fixed $\mu$ as $N$
increases, but bounded by a smaller value. See Figure 4 for plots of entropy rates for various
$\mu$ and $N$ for the neutral landscape.

Figure 4. Scaled Entropy Rate vs. Population Size for $\mu_{AB} = \mu_{BA} \in
\{0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001\}$ (top to bottom) with a neutral fitness
landscape for the Wright-Fisher process. Entropy rates are divided by $\log N$.
Left: Mutations only at the boundary states (boundary regime); the entropy
rate approaches a value less than 1/2 as $N \to \infty$. Right: Mutations for all
states (uniform regime); the entropy rates approach 1/2 as $N \to \infty$. Compare
to Figure 2.

For the neutral fitness landscape but with boundary mutations only, the transition
probability distributions are binomial with $p = i/N$. While at the central point $i = N/2$ the
transition entropy is the same as for the uniform mutation case, the entropies decrease away
from the central point. Moreover, the stationary distribution is concentrated at the boundary
states $i = 0$ and $i = N$, and so the entropy rate is much smaller. For fixed $N$, computations
indicate the stationary distribution approaches 1/2 at the boundary states and the entropy rate
tends to the binary entropy $H((\mu, 1 - \mu))$ plus contributions from the boundary-adjacent
states, which approaches zero as $\mu \to 0$. This also appears to be the case for fixed $N$ and
$\mu \to 0$ in the uniform mutation case. This is because for $N\mu \ll 1$, the transition
distributions approach binomials with $p \approx i/N$, which skews the activity of the process
toward the boundary states in the case of uniform mutations. For boundary mutations, small
$\mu$ means that the boundary states are more absorptive, and so the stationary distribution
has weight concentrated at the boundaries. Hence just as for the Moran process, the behavior is
very similar for small $\mu$ for the different mutation regimes, and the entropy rate goes to
zero as $\mu$ does. The analogous plot to Figure 3 for the Wright-Fisher process is nearly
identical in shape (but not magnitude), so we omit it.

We end this section with the following conjectures. A suitable theorem for the stationary
distribution analogous to the theorem from [10] used for Theorem 1 in the previous section
would partially establish the second conjecture. Note that the upper bound above (Equation
(8)) is not among the conjectures as the discussion contains sufficient proof.

3.1. Conjectures. Based on computational evidence and comparison with the Moran process,
we state the following conjectures for the entropy rate of the Wright-Fisher process.

Conjecture. (1) For the boundary mutation regime with $\mu_{AB} = \mu = \mu_{BA}$ and neutral
landscape, there exists a constant $C_\mu < 1/2$ such that
$\lim_{N \to \infty} H(P)/\log N = C_\mu$.

(2) Let $\mu_{AB} = \mu$ and $\mu_{BA} = k\mu$. For both the uniform and boundary mutation
regimes, $\lim_{\mu \to 0} H(P) = 0$.

4. Constant Fitness: Moran Process 

Now we turn back to the Moran process and investigate non-neutral fitness landscapes. 
Consider a population in which one type has constant relative fitness $r$, i.e. for the game
matrix $a = b = r$ and $c = d = 1$, corresponding to the classical Moran process. Figure
5 shows the relationship between $N$ and $r$ for various $\mu > 0$. In particular, it is clear
from the heatmaps that the maximum values for a given $N$ occur for $r = 1$ (i.e. tracing
horizontally along any of the heatmaps). Moreover, the entropy rates seem to behave the
same as the neutral landscape for large $N$ for both mutation regimes. The behavior for small
$\mu$ is governed by Theorems 1 and 2.
Figure 5. Entropy rate heatmaps for the Moran process, $r \in [0.6, 1.4]$ on the horizontal
axis, $N \in [2, 150]$ on the vertical axis. Top row: Boundary mutation regime. Bottom row:
Uniform mutation regime. Though the plots look symmetric horizontally about $r = 1$, they are
not precisely so. Colorbars are not consistent across plots (for additional resolution).
Entropy rates are generally smaller as $\mu$ decreases (left to right). For the top row, the
entropy rate is eventually decreasing as $N$ increases; for the bottom row, the entropy rate
increases as $N$ increases.

Figure 6. Mutation-drift balance: (Vertical) Population size for which
the entropy rate is maximal versus the relative fitness $r$ for various
fixed values of $\mu$ (boundary mutations). From bottom to top: $\mu =
0.05, 0.01, 0.005, 0.003, 0.001$. The maximum for $r = 1$ appears to occur when
$N\mu \approx 1$. The maximum entropy rate may occur for a large population size
even though the entropy rate tends to zero as the population size gets large.
These curves correspond to the maximum values in Figure 2.

For the boundary mutation case, there is evidently at least one local maximum
for the entropy rate as the population size varies (see Figure 2). This appears to be true more
generally. Figure 6 plots the population size that maximizes the entropy rate versus $r$ for
various mutation rates. Although drift typically dominates evolution for small populations 
leading such populations to be thought of as "more random", the inherent randomness 
measured by the entropy rate can be higher for larger populations. This is because higher 
entropy rate comes from a balance of mutation, selection, and drift that cannot simply be 
reduced to population size. Indeed, small populations can be subject to so much stochastic 
variation from drift that they become more predictable in their long run behavior and hence 
have smaller entropy rates. 

4.1. Constant Fitness: Wright-Fisher Process. Computational results for the Wright-Fisher
process indicate similar behavior for constant fitness landscapes. Figure 7 has heatmaps
for similar parameters as Figure 5 for the Wright-Fisher process. Interestingly, the
Wright-Fisher process appears to be more tightly clustered to $r = 1$ for the boundary
mutation regime, and varies more sharply for uniform mutation.

Figure 7. Entropy rate heatmaps for the Wright-Fisher process, $r \in [0.6, 1.4]$ on the
horizontal axis, $N \in [2, 100]$ on the vertical axis. Top row: Boundary mutation regime.
Bottom row: Uniform mutation regime. Though the plots look symmetric horizontally about
$r = 1$, they are not precisely so. Colorbars are not consistent across plots (for additional
resolution). Entropy rates are generally smaller as $\mu$ decreases (left to right). For the
top row, the entropy rate is eventually decreasing as $N$ increases; for the bottom row, the
entropy rate increases as $N$ increases.




Figure 8. Scaled Entropy Rate vs. Population Size for $\mu_{AB} = \mu_{BA} \in
\{0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001\}$ (top to bottom) with a neutral fitness
landscape for the n-fold Moran process. Entropy rates are divided by $\log N$.
Left: Mutations only at the boundary states (boundary regime). Right: Mutations
for all states (uniform regime). Compare to Figures 2 and 4.



5. Entropy Rates of n-POLD Moran Process 

Transition probabilities of the n-fold Moran processes are typically not binomial distri- 
butions, even in the generational k = N case. This is because the individuals are not 
necessarily chosen from the same population state like the individuals in the Wright-Fisher 
process. Depending on the values of the parameters, the entropy rates of the n-fold Moran 
process can be larger or smaller than those of the Wright-Fisher process for large k with all 
other parameters the same. The same is true for n-fold processes for diflFerent values of n 
but other parameters equal, even n = 1 versus n = 2. Numerical computations indicate this 
is heavily dependent on the value of the mutation rate. 

Once again the neutral fitness landscape appears to give the maximum value of the entropy
rate. This is simply because values of r ≠ 1 lead the stationary distribution to favor
one fixation state over the other, giving less spread out distributions. See Figure
9 and compare to Figures 5 and 7. The entropy rates for the N-fold Moran processes are
qualitatively similar to both the Moran process and the Wright-Fisher process. For n ^
the entropy rates are very similar to those in Figure 9.
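
The comparison above can be explored numerically with a minimal sketch. It assumes that the n-fold process is the n-th power of the one-step Moran transition matrix, that mutation is uniform with a constant probability mu in both directions, and that the landscape is f_A = r, f_B = 1. The function names (moran_matrix, n_fold, entropy_rate) are hypothetical and are not taken from the author's code.

```python
import math

def moran_matrix(N, mu, r=1.0):
    """One-step Moran matrix on states i = 0..N (number of type-A individuals).

    Assumptions for illustration: constant mutation probability mu in both
    directions (uniform regime) and the landscape f_A = r, f_B = 1.
    """
    T = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        phi_a, phi_b = i * r, float(N - i)   # replicator incentives
        total = phi_a + phi_b
        up = (phi_a * (1 - mu) + phi_b * mu) / total * (N - i) / N
        down = (phi_a * mu + phi_b * (1 - mu)) / total * i / N
        if i < N:
            T[i][i + 1] = up
        if i > 0:
            T[i][i - 1] = down
        T[i][i] = 1.0 - up - down
    return T

def birth_death_stationary(T):
    """Stationary distribution of a birth-death chain via detailed balance."""
    w = [1.0]
    for i in range(len(T) - 1):
        w.append(w[-1] * T[i][i + 1] / T[i + 1][i])
    z = sum(w)
    return [x / z for x in w]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_fold(T, n):
    """n-th power of T (assumed definition of the n-fold process)."""
    P = T
    for _ in range(n - 1):
        P = matmul(P, T)
    return P

def entropy_rate(P, s):
    """Sum over states of s_i times the Shannon entropy of row i (natural log)."""
    return -sum(s[i] * p * math.log(p)
                for i, row in enumerate(P) for p in row if p > 0)

N, mu = 20, 0.1
T = moran_matrix(N, mu)
s = birth_death_stationary(T)          # T and its powers share this distribution
print(entropy_rate(T, s), entropy_rate(n_fold(T, N), s))
```

Because every power of T has the same stationary distribution as T, the sketch computes it once from the one-step chain and reuses it for the N-fold matrix.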







(a) μ = 0.2 (b) μ = 0.04 (c) μ = 0.005



Figure 9. Entropy rate heatmaps for the N-fold Moran process, r ∈ [0.4, 1.6] on the horizontal axis, N ∈
[2, 100] on the vertical axis. Top row: Boundary mutation regime. Bottom row: Uniform mutation regime.
Colorbars are not consistent across plots (for additional resolution). Entropy rates are generally smaller as μ
decreases (left to right). For the top row, the entropy rate is eventually decreasing as N increases; for the
bottom row, the entropy rate increases as N increases. Compare to Figures 5 and 7.



6. Other Common Game Matrices 



There are three standard fitness landscapes generated by 2x2 game matrices [20, 21].
See [11] for discussions of the stationary distribution for each case. Let us consider a game
commonly referred to as the Hawk-Dove or Anti-coordination game, which has an interior
evolutionarily stable state for continuous dynamics. The game is given by a game matrix with
a = d < b = c. In the case where the game matrix is a = 0 = d, b = 1 = c, for both
mutation regimes the stationary distribution is given by



\[
s_0 = s_N = \frac{1}{2 + 2\mu(2^N - 2)}, \qquad
s_i = \frac{2\mu}{2 + 2\mu(2^N - 2)} \binom{N}{i} \quad (0 < i < N),
\]



which gives another example of a nontrivial connection between N and μ. For this process,
the denominator has a term with μ 2^N rather than the μ N log N of the neutral landscape.
As such, this process is much more robust for even moderate N because the factor of 2^N
dominates the behavior of the boundary states. In other words, the stationary distribution
stabilizes very quickly as N increases, compared to the neutral landscape, because the 2^N
term dwarfs the other parameter contributions. Hence in general the interaction between
N and μ differs significantly depending on the fitness landscape. This means that the outcome
of taking a limit (μ, N) → (0, ∞) to eliminate the effect of both drift and mutation is determined
by the functional relationship between μ and N. Moreover, there may be significant implications for
assumptions commonly applied in methods like molecular clocks and the neutral theory
of evolution: the rate of evolution depends on selection, mutation, and drift. For μ = 1/2,
this is the same distribution as in Equation 7, for which the entropy rate is as given above. In
particular, this shows that even for situations that would be regarded as evolutionarily stable,
there can still be a significant amount of variation in the long-run behavior of the process
in a finite population, and this phenomenon is captured by the entropy rate. Moreover,
despite the large mutation rate μ = 1/2, the stationary distribution is still strongly concentrated
on the center state (the distribution is binomial [11] and has standard deviation less than
10 for N = 100), indicating that conventional wisdom regarding evolutionary instability due
to large mutation rates is not universal. The dependence of the stationary distribution on
mutation rate is dominated by the exponential dependence on population size.
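
As a worked check of the claims above, the following sketch evaluates the stationary distribution numerically. It assumes the closed form s_0 = s_N = 1/(2 + 2μ(2^N − 2)) and s_i = 2μ C(N, i)/(2 + 2μ(2^N − 2)) for 0 < i < N; the function names are hypothetical.

```python
import math

def hawk_dove_stationary(N, mu):
    """Stationary distribution for the a = 0 = d, b = 1 = c game (assumed closed form)."""
    denom = 2 + 2 * mu * (2 ** N - 2)
    s = [2 * mu * math.comb(N, i) / denom for i in range(N + 1)]
    s[0] = s[N] = 1 / denom          # boundary states carry no binomial factor
    return s

def mean_and_std(s):
    mean = sum(i * p for i, p in enumerate(s))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(s))
    return mean, math.sqrt(var)

N = 100
for mu in (0.5, 1e-3):
    s = hawk_dove_stationary(N, mu)
    mean, std = mean_and_std(s)
    # Boundary mass s_0 + s_N is negligible once mu * 2^N is large.
    print(mu, s[0] + s[N], mean, std)
```

For μ = 1/2 the distribution reduces to the binomial with parameters N and 1/2, whose standard deviation is √N/2, that is, 5 for N = 100. For smaller μ the interior remains proportional to the binomial coefficients, so only the boundary mass changes, and it is suppressed by the factor 2^N.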

Another common game type is the prisoner's dilemma. Claussen and Traulsen consider
such a game having a Nash equilibrium at the state i = 0, defined by a = 3, b = 0, c = 5, d = 1
[pr]. They also note that T_{1→2} = 0 if self-interaction is not allowed, and assume a small
mutation rate from state 1 to 2, which is necessary for the boundary mutation regime
to have a stationary distribution. For the uniform regime, T_{1→2} ≠ 0 and there is a unique
stationary state. This example illustrates the classic mutation-selection balance. For small
mutation probabilities, the stationary distribution concentrates at i = 0, but for larger rates
the stationary distribution moves away from the state i = 0, with both types "surviving".
This is also a case in which the uniform regime is both mathematically and realistically
superior.
7. Discussion

The entropy rates of both the Moran process and the Wright-Fisher process are bounded
and less than the theoretical maximum entropy rate attainable by a Markov process on N + 1
states, where N is the population size. The bound for the Wright-Fisher process depends on
the population size and is approximately one-half the maximum theoretical value for large
populations. For the Moran process, the bound is independent of the population size and
is a much larger fraction of the theoretical maximum (94.6%). These results imply that the
inherent randomness of evolutionary processes, insofar as they are modeled by the Moran
process and the Wright-Fisher process, is fundamentally bounded for fixed population
sizes. Moreover, as the proof in the appendix shows, the bound for the Moran process
is a consequence of fitness proportionate selection. This means that either evolutionary
processes are fundamentally ordered to some extent or that the processes considered here
are not accurate models of evolutionary processes. Given the many applications of these
models [22, 9, 23], this work concludes that there is both order and randomness in these
processes that is characterized at least in part by the entropy rate.

Intuitively, for all the processes considered, the entropy rate is maximal for neutral fitness 
landscapes; nevertheless, the highest entropy rate can occur for large populations when the 
other parameters are held constant, indicating sources of randomness in long-run popula- 
tion behavior can overcome those due to small population size. Though there are multiple 
approaches to mutation in the literature for the Moran process, the two approaches promi- 
nently discussed in this manuscript share the property that the entropy rate tends to zero 
as the mutation rate tends to zero. Mutation adds diversity to evolving populations, so it 
is also intuitive that the inherent randomness of population states is strongly dependent on 
mutation rates. Nevertheless, there can be very different relationships between the mutation 
rate and population size. As we have seen in cases where explicit calculations are possible, 
these parameters can interact directly or in a more complex manner, with the population 
size dominating the behavior in some cases and the mutation rate in others. Both interact 
significantly with the fitness landscape. 

Finally, let us consider the meaning of inherent randomness as an interpretation of entropy
rate in this context. Processes with entropy rate approaching zero are those that fixate and
occupy few states with significant probability. Relatively large entropy rates can arise from
flat landscapes and well-spread stationary distributions, or from a tight coupling between
states with high transition entropy and states with high stationary probability. Hence randomness can
be the result of movement between many population states or of more frequent movement
between a smaller number of states. The distributions around the most stable states [11],
in terms of the stability theory of evolutionary games, can have a significant impact on the
entropy rate. Typically the entropy rates for the Wright-Fisher and N-fold Moran processes
are close in value, and both are larger than the entropy rate of the Moran process,
which is an intuitively "less random" process, consisting of many incremental shifts rather
than generational sampling.



Methods. All computations were performed with Python code available at
https://github.com/marcharper/entropy_rate. All plots were created with matplotlib [24].
8. Appendix

Proof of Theorem 1. The entropy rate can be written as the sum s_0 H((μ, 1 − μ)) + s_N H((kμ, 1 − kμ))
+ Σ_{i=1}^{N−1} s_i H(T_i). As μ → 0, the first two terms converge to zero since H((0, 1)) = 0
and s_0 + s_N → 1, and the sum converges to zero since s_i → 0 for i ≠ 0, N. The latter
holds because the transition probabilities depend at most linearly on μ, and so H(T_i) cannot
prevent s_i H(T_i) from converging to zero for 0 < i < N as μ → 0. □

Proof of Theorem 2. The proof is essentially the same as for Theorem 1, except now we can
argue that for i ≠ 0, N, the stationary probabilities s_i (Equation 2) have an additional factor
of μ versus s_0 and s_N because T_{0→1} = μ and T_{N→N−1} = kμ. Hence as before, s_0 + s_N → 1
and s_1 + ... + s_{N−1} → 0 as μ → 0. Equations 2 and 3 imply the same fixation probabilities as
the uniform mutation case [10]. □
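
The limiting behavior in these proofs can be illustrated numerically. The sketch below is a minimal example under stated assumptions: a neutral landscape, mutation only at the boundary states with T_{0→1} = μ and T_{N→N−1} = kμ, and hypothetical function names that do not come from the author's code.

```python
import math

def boundary_moran(N, mu, k=1.0):
    """Neutral Moran matrix with mutation only at the boundary states 0 and N."""
    T = [[0.0] * (N + 1) for _ in range(N + 1)]
    T[0][1], T[0][0] = mu, 1 - mu
    T[N][N - 1], T[N][N] = k * mu, 1 - k * mu
    for i in range(1, N):
        p = i * (N - i) / N ** 2      # neutral case: up and down moves are equally likely
        T[i][i + 1] = T[i][i - 1] = p
        T[i][i] = 1 - 2 * p
    return T

def entropy_rate(T):
    """Sum of s_i * H(T_i), with s obtained from detailed balance."""
    w = [1.0]
    for i in range(len(T) - 1):
        w.append(w[-1] * T[i][i + 1] / T[i + 1][i])
    z = sum(w)
    return -sum((w[i] / z) * p * math.log(p)
                for i, row in enumerate(T) for p in row if p > 0)

rates = [entropy_rate(boundary_moran(10, mu)) for mu in (1e-1, 1e-2, 1e-3, 1e-4)]
print(rates)   # shrinks toward zero as mu decreases
```

In this setting the interior stationary weights are proportional to μ, so both the interior sum and the boundary terms s_0 H((μ, 1 − μ)) and s_N H((kμ, 1 − kμ)) vanish together as μ → 0.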

To prove the maximum entropy rate for the Moran process, we first start with a generalization.
Replace i f_A(i) by φ_A(i) and (N − i) f_B(i) by φ_B(i) to get the incentive dynamic in
a finite population [25]:



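\[
\begin{aligned}
T_{i \to i+1} &= \frac{\varphi_A(i)\,(1 - \mu_{AB}(i)) + \varphi_B(i)\,\mu_{BA}(N - i)}{\varphi_A + \varphi_B} \cdot \frac{N - i}{N}, \\
T_{i \to i-1} &= \frac{\varphi_A(i)\,\mu_{AB}(i) + \varphi_B(i)\,(1 - \mu_{BA}(N - i))}{\varphi_A + \varphi_B} \cdot \frac{i}{N}, \\
T_{i \to i} &= 1 - T_{i \to i+1} - T_{i \to i-1}.
\end{aligned}
\tag{9}
\]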

For particular choices of incentive function, one can replace the replicator incentive with 
that corresponding to another evolutionary dynamic, such as the incentives for the best 
reply, logit, Fermi, or other incentive. 

Theorem 3. For the incentive dynamics process defined above, the maximum entropy rate
is (3/2) log 2.

Proof. For the boundary mutation regime, T_{i→i+1} and T_{i→i−1} are the result of multiplying two
probability distributions component-wise, namely (φ_A/(φ_A + φ_B), φ_B/(φ_A + φ_B)) and
((N − i)/N, i/N). Because of this internal relationship, the entropy rate is bounded lower than
the theoretical maximum of log 3. To see this, consider more generally the first two terms of the
Shannon entropy resulting from the component-wise product of two distributions (x, 1 − x) and (y, 1 − y):

\[
E_0 = -xy \log xy - (1 - x)(1 - y) \log (1 - x)(1 - y).
\]

E_0 is maximal when x = 1/2 = y, but more generally is maximal when x = y for all 0 < x + y =
c < 1. Combining this with the third term of the entropy, corresponding to the probability
1 − x^2 − (1 − x)^2 = 2x(1 − x), gives

\[
E = -x^2 \log x^2 - (1 - x)^2 \log (1 - x)^2 - 2x(1 - x) \log 2x(1 - x),
\]

which has a maximum of (3/2) log 2 when x = 1/2. This corresponds to the distribution
(1/4, 1/4, 1/2) as seen earlier in the text, and bounds the entropy rate of the process (9). The same
argument applies to an arbitrary mutation regime: the transitions are the product of the
distributions ((N − i)/N, i/N) and
( [φ_A(i)(1 − μ_AB(i)) + φ_B(i) μ_BA(N − i)]/(φ_A + φ_B), [φ_A(i) μ_AB(i) + φ_B(i)(1 − μ_BA(N − i))]/(φ_A + φ_B) ). □
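
The bound in Theorem 3 can be checked numerically with a short sketch. It assumes natural logarithms and uses a simple grid search over the two distributions (x, 1 − x) and (y, 1 − y); the function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a probability vector, skipping zeros."""
    return -sum(q * math.log(q) for q in p if q > 0)

def product_entropy(x, y):
    """Entropy of the three-outcome row built from (x, 1 - x) and (y, 1 - y)."""
    up = x * y
    down = (1 - x) * (1 - y)
    stay = 1 - up - down
    return entropy((up, down, stay))

grid = [i / 100 for i in range(101)]
best_value, best_x, best_y = max(
    (product_entropy(x, y), x, y) for x in grid for y in grid
)
bound = 1.5 * math.log(2)
print(best_value, best_x, best_y, bound, bound / math.log(3))
```

On this grid the maximum is attained at x = y = 1/2, where the row is (1/4, 1/4, 1/2). The ratio of (3/2) log 2 to log 3, the maximum for an unconstrained three-outcome distribution, is approximately 0.946.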